HBASE-29081: Add HBase Read Replica Cluster feature (#8044)
Conversation
(cherry picked from commit 7ab9d52)
* HBASE-29083: Add global read-only mode to HBase; add hbase read-only property and ReadOnlyController (cherry picked from commit 49b678d)
* HBASE-29083. Allow test to update hbase:meta table
* HBASE-29083. Spotless apply
* Refactor code to have only passing tests
* Apply spotless

Co-authored-by: Andor Molnar <andor@cloudera.com>
… Level (#6931) Co-authored-by: Andor Molnar <andor@cloudera.com> Co-authored-by: Anuj Sharma <sharma.anuj1991@gmail.com>
Change-Id: Ia04bb12cdaf580f26cb14d9a34b5963105065faa
* CDPD-84463 Add ruby shell commands for refresh_hfiles
* [CDPD-84466] Add hbase-client API code for refresh_hfiles
* CDPD-84465 Add protobuf messages for refresh_hfiles
* Add refreshHfile function in master RPC service and make the call to it
* CDPD-82553 Add function in Region Server to refresh HFiles
* Add nonceGroup and nonce for the Master RPC request
* Refactor code with proper function names
* Add region server procedure and callables
* Remove the refreshHFiles function that was intended to be called as an RS RPC, since we will call it through the procedure framework
* Remove the unwanted comments
* Restore line mistakenly removed in admin.proto
* Correct the wrong comment in Event Types
* Apply Spotless
* Address the review comments with small code changes
* Add separate function for master service caller
* Add retry mechanism for refresh_hfiles; throw an exception if the retry threshold is breached, and also handle the case where the region is not online
* Add table name to RefreshHFilesTableProcedureStateData
* CDPD-88507, CDPD-88508 Add procedure support for namespace as a parameter and for no parameter
* nit: Give the method a meaningful name and remove comments
* Throw an exception if the user is updating a system table or reserved namespace
* Throw an exception if the table name or namespace is invalid; also remove redundant TODOs
* Add gatekeeper method to prevent the command from executing before master initialization
* Throw an exception when both TABLE_NAME and NAMESPACE are provided as arguments
* Run Spotless
* Add unit tests for the refreshHfiles procedure and admin calls
* Make the newly added HFiles available for reading immediately
* Revert "Make the newly added HFiles available for reading immediately" (reverts commit c25cc9a)
* Address review comments
* Create test base class to avoid code duplication
* Add integration test which enables read-only mode before refresh
* Add test rule and rebase on upstream
* Apply spotless
…'s meta table after HbckChore run (#7304)
#7325) * HBASE-29597 Supply meta table name for replica to the tests in TestMetaTableForReplica class
…filelist is n… (#7361)

* HBASE-29611: With FILE based SFT, the list of HFiles we maintain in .filelist is not getting updated for the read replica

Link to JIRA: https://issues.apache.org/jira/browse/HBASE-29611

Steps to reproduce (for detailed steps, check the JIRA):
- Create two clusters on the same storage location.
- Create a table on the active cluster, then refresh meta on the read replica to pick up the table metadata.
- Add some rows and flush on the active cluster, run refresh_hfiles on the read replica, and scan the table.
- Add more rows on the active cluster and run refresh_hfiles again: the newly added rows are not visible on the read replica.

Cause: Refreshing store files is a two-step process:
1. Load the existing store file list from .filelist (choosing the file with the higher timestamp).
2. Refresh the store file internals (clean up old/compacted files, replace the store file list in .filelist).

On the first refresh, the read replica loads the list of HFiles from the .filelist file created by the active cluster, but then writes a new file with a greater timestamp, so .filelist now holds two files. Subsequent flushes from the active cluster update only the active cluster's file, not the one created by the read replica. Since refresh_hfiles loads the file with the higher timestamp, the stale file written by the read replica wins, and it lacks the updated list of HFiles.

Fix: Since we always want to load the file written by the active cluster, the read replica must not create a new file in .filelist; this eliminates the timestamp mismatch. We also must not initialize the tracker file (StoreFileListFile.java:load()) from the read replica, because we never write it, so a read-only check was added to StoreFileTrackerBase.java:load().

* Make the read-only cluster behave like a secondary replica
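The timestamp race described above can be sketched in plain Java. The `TrackerFile` record and `pickNewest` below are illustrative stand-ins for HBase's StoreFileListFile logic, not the real API; the point is only that "load the file with the higher timestamp" lets a stale replica-written file shadow the active cluster's continuously updated one.

```java
import java.util.Comparator;
import java.util.List;

public class FileListSelection {
  // hypothetical stand-in for an entry in the .filelist directory
  public record TrackerFile(String writer, long timestamp) {}

  // mimics the selection rule: load the file with the higher timestamp
  public static TrackerFile pickNewest(List<TrackerFile> files) {
    return files.stream()
        .max(Comparator.comparingLong(TrackerFile::timestamp))
        .orElseThrow();
  }

  public static void main(String[] args) {
    // Bug scenario: active wrote at t=100, the replica then wrote its own
    // copy at t=150; later active flushes only update the active lineage.
    List<TrackerFile> bug =
        List.of(new TrackerFile("active", 100), new TrackerFile("replica", 150));
    System.out.println(pickNewest(bug).writer()); // stale replica file wins

    // Fixed scenario: the replica never writes, so only active's file exists.
    List<TrackerFile> fixed = List.of(new TrackerFile("active", 200));
    System.out.println(pickNewest(fixed).writer());
  }
}
```

With the fix, the replica is a pure reader of the tracker file, so the selection can only ever return the active cluster's latest list.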
Link to JIRA: https://issues.apache.org/jira/browse/HBASE-29644

Description: Consider a two-cluster setup with one active cluster and one read replica, where the active cluster creates a table with FILE based SFT. If you add a few rows through the active cluster and flush to create a few HFiles, then run refresh_meta from the read replica, a minor compaction is triggered. That must not happen from the read replica: the active cluster is not aware of the event, so it may create inconsistencies.

Cause: The compaction event should be blocked in ReadOnlyController, but the read-only guard was missing from preCompactSelection().

Fix: Add internalReadOnlyGuard to preCompactSelection() in ReadOnlyController.
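A minimal sketch of the fix, assuming a simplified stand-in for the PR's ReadOnlyController rather than HBase's real coprocessor interfaces: the read-only guard must also cover compaction selection, so a refresh on the replica cannot kick off a compaction the active cluster never learns about.

```java
import java.io.IOException;

public class ReadOnlyGuardSketch {
  // stand-in for org.apache.hadoop.hbase.DoNotRetryIOException
  public static class DoNotRetryIOException extends IOException {
    public DoNotRetryIOException(String msg) { super(msg); }
  }

  private final boolean readOnly;

  public ReadOnlyGuardSketch(boolean readOnly) { this.readOnly = readOnly; }

  private void internalReadOnlyGuard() throws DoNotRetryIOException {
    if (readOnly) {
      throw new DoNotRetryIOException("Operation not allowed in Read-Only Mode");
    }
  }

  // The previously unguarded hook: with the guard in place, compaction
  // selection is rejected on a read-only cluster before any files are chosen.
  public void preCompactSelection() throws IOException {
    internalReadOnlyGuard();
    // ... on a writable cluster, normal selection would proceed here
  }
}
```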
#7437) * HBASE-29642 Active cluster file is not being updated after promoting a new active cluster
…y controller (#7464) * HBASE-29693: Implement the missing observer functions in the read-only controller * Remove setter method to set read-only configuration
…r's tables before refreshing meta and hfiles (#7474) Signed-off-by: Tak Lon (Stephen) Wu <taklwu@apache.org> Signed-off-by: Andor Molnár <andor@apache.org> Reviewed by: Kota-SH <shanmukhaharipriya@gmail.com>
…de (#7554)

* HBASE-29778: Abort the retry operation if not allowed in read-only mode

Currently, when we discover that an operation is not allowed in read-only mode we throw an exception, but the context does not get aborted, so multiple copies of the same exception are thrown. The root cause is that we throw a plain IOException, which the client treats as retryable, so it retries the same operation and hits the same failure again. Aborting the server could lead to RS instability or corruption, and context.bypass() would skip the guard and perform the operation anyway, so the safest option is to throw DoNotRetryIOException.

* Commit to rerun the job
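The exception-type distinction can be illustrated with a hypothetical minimal retry loop (this is not HBase's actual RpcRetryingCaller): a plain IOException is retried up to the attempt limit, while a DoNotRetryIOException stand-in aborts immediately, so the read-only rejection surfaces once instead of once per retry.

```java
import java.io.IOException;

public class RetrySemanticsSketch {
  // stand-in for org.apache.hadoop.hbase.DoNotRetryIOException
  public static class DoNotRetryIOException extends IOException {
    public DoNotRetryIOException(String msg) { super(msg); }
  }

  public interface Call { void run() throws IOException; }

  // returns the number of attempts used if the call eventually succeeds
  public static int callWithRetries(Call call, int maxAttempts) throws IOException {
    int attempts = 0;
    IOException last = null;
    while (attempts < maxAttempts) {
      attempts++;
      try {
        call.run();
        return attempts;
      } catch (DoNotRetryIOException e) {
        throw e; // abort: retrying a read-only rejection cannot succeed
      } catch (IOException e) {
        last = e; // treated as transient: loop and retry
      }
    }
    throw last;
  }
}
```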
…able (#7555)

* HBASE-29779: Call the super coprocessor instead of returning for system tables

* Allow the operation only for hbase:meta instead of all system tables: some system tables, such as acl and namespace, are shared with the active cluster, so allowing operations on them from the read-only cluster would make the system inconsistent.

* Invert the method names and add negation at the call sites

* Use the API from the TableName class instead of a static variable comparison, to avoid conflicts after the changes in HBASE-29691: Change TableName.META_TABLE_NAME from being a global static
…t uses the filesystem (#7702) Change-Id: I776f956c830a7f4671cfae265269a21fa61d0bdf
…trollers (#7661)

* HBASE-29841: Split bulky ReadOnlyController into multiple smaller controllers

Currently a single ReadOnlyController has to be added as a coprocessor for the master, region, and region server. This task breaks ReadOnlyController into multiple smaller controllers to avoid registering methods that are not relevant for a particular role; for example, the master coprocessor should only register methods that may run on the master, not on the region or region server.

* Address review comments
…itialization (#7743)

* HBASE-29756: Programmatically register related coprocessors during initialization
* Apply Spotless
* Remove the cached globalReadOnlyMode variable and make manageClusterIdFile static
* Address review comments
* Address review comments
* Make coprocessor addition and removal generic
* Make manageClusterIdFile idempotent
* Address review comments
* Avoid the IntelliJ warning about fixed-size array creation
…ix file during startup (#7881)

* HBASE-29959 Cluster started in read-only mode mistakenly deletes suffix file during startup
* Move log message to if block
* Close file input stream
* Change the getter so that it does not mutate the suffix data
…riter on secondary replicas or in read-only mode (#7920)

* Remove unused variable
* HBASE-29960 java.lang.IllegalStateException: Should not call create writer on secondary replicas or in read-only mode
* HBASE-29958 Improve log messages
* Address review comments
* Update hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java (Co-authored-by: Kota-SH <shanmukhaharipriya@gmail.com>)
* Update hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java (Co-authored-by: Kota-SH <shanmukhaharipriya@gmail.com>)
* Update hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java (Co-authored-by: Kota-SH <shanmukhaharipriya@gmail.com>)
* HBASE-29961 Secondary cluster is unable to replayWAL for meta (#7854)
* Add <blank> when no suffix is provided
* Address a few review comments
* HBASE-29958. Refactor ActiveClusterSuffix to use protobuf, refactor logging
* HBASE-29958. Remove more redundant logic, test cleanup
* HBASE-29958. Spotless apply
* HBASE-29958. Revert mistake
* HBASE-29958 Improve log messages
* Address Kevin's review comment about multiple ':' in the active cluster suffix
* Change the file-deletion code since getClusterSuffixFromConfig() changed
* Use ActiveClusterSuffix object comparison instead of byte array comparison

Co-authored-by: Kota-SH <shanmukhaharipriya@gmail.com>
Co-authored-by: Andor Molnar <andor@cloudera.com>
* HBASE-29965: Unable to dynamically change readonly flag (Change-Id: I5b5479e37921ea233f586f0f02d2606320e16139)
* Refactor repeated code (Change-Id: I9a0269b786f7282686d60ceff47a538d2b0b88fa)
* Add docstrings (Change-Id: I3b456e0b2689dfad09d1f5a4b47fe8fd85d06bf9)
…ng in FSUtils (#8006)

* HBASE-29993. Refactor cluster id and suffix in FSUtils
* HBASE-29993. Spotless apply
* HBASE-29993. Renaming
* Fix typo (Co-authored-by: Kevin Geiszler <kgeiszler@cloudera.com>)
* HBASE-29993. Spotless apply

Co-authored-by: Kevin Geiszler <kgeiszler@cloudera.com>
    protected void internalReadOnlyGuard() throws DoNotRetryIOException {
      throw new DoNotRetryIOException("Operation not allowed in Read-Only Mode");
    }
I think it would be nice to subclass DoNotRetryIOException with something more specific to this situation, like WriteAttemptedOnReadOnlyClusterException
Noted. Will update code accordingly.
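The suggestion above could look like the following sketch. The subclass name comes from the review comment; both classes here are simplified stand-ins for the HBase exception hierarchy, and the actual class added by the PR may differ.

```java
import java.io.IOException;

public class ReadOnlyExceptionSketch {
  // stand-in for org.apache.hadoop.hbase.DoNotRetryIOException
  public static class DoNotRetryIOException extends IOException {
    public DoNotRetryIOException(String msg) { super(msg); }
  }

  // the reviewer's proposed, more specific type
  public static class WriteAttemptedOnReadOnlyClusterException
      extends DoNotRetryIOException {
    public WriteAttemptedOnReadOnlyClusterException(String msg) { super(msg); }
  }

  // The guard throws the specific type; existing catch blocks for
  // DoNotRetryIOException keep working unchanged.
  public static void internalReadOnlyGuard() throws DoNotRetryIOException {
    throw new WriteAttemptedOnReadOnlyClusterException(
        "Operation not allowed in Read-Only Mode");
  }
}
```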
    } catch (IOException ioe) {
      LOG.warn("Exception while trying to refresh store files: ", ioe);
    }
Could you talk about the decision to swallow the error here? I'm on the fence if that is the right choice.
    LOG.debug("Scanning namespace {}", namespacePath.getName());
    List<Path> tableDirs = FSUtils.getLocalTableDirs(fs, namespacePath);

    return tableDirs.parallelStream().flatMap(tableDir -> {
Delegating into the common ForkJoinPool feels dicey here. I would feel safer if this was a regular stream().
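The concern is that `parallelStream()` runs its work on the JVM-wide `ForkJoinPool.commonPool()`, where blocking filesystem calls can starve unrelated tasks sharing that pool. A sketch of the sequential alternative, with plain strings standing in for `Path` and the listing helper (the `listUnder` method is hypothetical):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TableScanSketch {
  // hypothetical per-table listing; the real code lists files under a table dir
  static Stream<String> listUnder(String tableDir) {
    return Stream.of(tableDir + "/cf1", tableDir + "/cf2");
  }

  public static List<String> scanSequential(List<String> tableDirs) {
    return tableDirs.stream()          // sequential: no common-pool contention
        .flatMap(TableScanSketch::listUnder)
        .collect(Collectors.toList());
  }
}
```

If parallelism is genuinely needed, a dedicated executor sized for blocking I/O is the usual alternative to borrowing the common pool.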
      CoprocessorConfigurationUtil.checkConfigurationChange(this.cpHost, newConf,
        CoprocessorHost.MASTER_COPROCESSOR_CONF_KEY) && !maintenanceMode
    ) {
      LOG.info("Update the master coprocessor(s) because the configuration has changed");
Might be nice to keep this logging?
      Consumer<Boolean> stateSetter, CoprocessorReloadTask reloadTask) {

    boolean maybeUpdatedReadOnlyMode = ConfigurationUtil.isReadOnlyModeEnabled(newConf);
    boolean hasReadOnlyModeChanged = originalIsReadOnlyEnabled != maybeUpdatedReadOnlyMode;
I think that if this method/class must know whether the read-only mode has changed, it should track that itself, and not depend on the caller to help track it. Maybe you could look at what coprocessors are currently loaded to find out whether read-only mode is enabled without explicitly tracking it.
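One way to realize the suggestion, sketched with placeholder names (the suffix check and method names are illustrative, not the PR's actual types): derive the read-only state from the set of loaded coprocessors instead of threading `originalIsReadOnlyEnabled` in from the caller.

```java
import java.util.Set;

public class ReadOnlyStateSketch {
  // read-only mode is considered "on" iff a read-only controller is loaded;
  // the name-suffix check is a placeholder for comparing concrete classes
  public static boolean isReadOnlyModeLoaded(Set<String> loadedCoprocessors) {
    return loadedCoprocessors.stream()
        .anyMatch(name -> name.endsWith("ReadOnlyController"));
  }

  // change detected purely from observed state vs. the new configuration,
  // with no caller-supplied "original" flag to keep in sync
  public static boolean hasReadOnlyModeChanged(Set<String> loadedCoprocessors,
      boolean newConfValue) {
    return isReadOnlyModeLoaded(loadedCoprocessors) != newConfValue;
  }
}
```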
    public void registerConfigurationObservers(ConfigurationManager configurationManager) {
      Coprocessor foundCp;
      Set<String> coprocessors = this.getCoprocessors();
      for (String cp : coprocessors) {
        foundCp = this.findCoprocessor(cp);
        if (foundCp instanceof ConfigurationObserver) {
          configurationManager.registerObserver((ConfigurationObserver) foundCp);
        }
      }
    }

    /**
     * Deregisters relevant coprocessors from the {@link ConfigurationManager}. Coprocessors are
     * considered "relevant" if they implement the {@link ConfigurationObserver} interface.
     * @param configurationManager the ConfigurationManager the coprocessors get deregistered from
     */
    public void deregisterConfigurationObservers(ConfigurationManager configurationManager) {
      Coprocessor foundCp;
      Set<String> coprocessors = this.getCoprocessors();
      for (String cp : coprocessors) {
        foundCp = this.findCoprocessor(cp);
        if (foundCp instanceof ConfigurationObserver) {
          configurationManager.deregisterObserver((ConfigurationObserver) foundCp);
        }
      }
    }
It looks like none of your coprocessors implement ConfigurationObserver. Was this meant as speculative infrastructure, or left in by accident?
    CoprocessorConfigurationUtil.maybeUpdateCoprocessors(newConf, this.isGlobalReadOnlyEnabled,
      this.cpHost, CoprocessorHost.MASTER_COPROCESSOR_CONF_KEY, this.maintenanceMode,
      this.toString(), val -> this.isGlobalReadOnlyEnabled = val,
      conf -> initializeCoprocessorHost(newConf));
Use the captured conf
Hi all,
We would like to propose merging the feature “Read Replica Cluster” into
the main branch.
Background
We’d like to implement an open source version of Amazon’s Read Replica
Cluster on S3 feature for Apache HBase. It adds the ability to run
another HBase cluster on the same cloud storage location in read-only mode,
allowing users to share the read workload between multiple clusters. Due
to the characteristics of the implementation and the lack of automated
synchronization between the active and read-replica clusters, read replicas
are eventually consistent, so they are not suitable for reading the most
recent data. However, we still believe that users of open source Apache HBase
can take advantage of this feature, and there are use cases out there that
read replicas can help with. Please find more information about the
feature in the linked blog post.
Pros
- Multiple clusters can share the read part of the entire workload,
which is cost and time efficient.

Cons
- Due to eventual consistency, the most recently written data is not
visible from read replicas.
- There is no automated synchronization; users have to manually
refresh hfiles/meta on read replicas.
A detailed description of the design and implementation can be found in the
following document.
Apache HBase Read Replica Cluster Feature
Please review and share your feedback or comments.
Best regards,
Andor